{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Handling Missing Keys with `setdefault`\n",
    "\n",
    "This notebook contains example code from [*Fluent Python*](http://shop.oreilly.com/product/0636920032519.do), by Luciano Ramalho.\n",
    "\n",
    "Code by Luciano Ramalho, modified by Allen Downey.\n",
    "\n",
    "MIT License: https://opensource.org/licenses/MIT"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this notebook, we show two ways to create a mapping of words and their occurrences. For each word in a text file, we create a list of tuples, one for each occurrence. The tuple values represent the position (line and column) of the word in the file. (Note that the line and column positions are indexed starting at one.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import sys\n",
    "import re"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "file_name = 'text.txt'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# temporary text file\n",
    "with open(file_name, 'w') as f:\n",
    "    f.write('Fluent Python notebooks\\nJupyter notebooks')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "WORD_RE = re.compile('\\w+')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# using `.get()` method\n",
    "index = {}\n",
    "with open(file_name) as fp:\n",
    "    for line_no, line in enumerate(fp, 1):\n",
    "        for match in WORD_RE.finditer(line):\n",
    "            word = match.group()\n",
    "            column_no = match.start()+1\n",
    "            location = (line_no, column_no)\n",
    "            # this is ugly; coded like this to make a point\n",
    "            occurrences = index.get(word, [])  # <1>\n",
    "            occurrences.append(location)       # <2>\n",
    "            index[word] = occurrences          # <3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fluent [(1, 1)]\n",
      "Jupyter [(2, 1)]\n",
      "notebooks [(1, 15), (2, 9)]\n",
      "Python [(1, 8)]\n"
     ]
    }
   ],
   "source": [
    "# print in alphabetical order\n",
    "for word in sorted(index, key=str.upper):\n",
    "    print(word, index[word])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# better solution\n",
    "# using `.setdefault()` method\n",
    "index = {}\n",
    "with open(file_name) as fp:\n",
    "    for line_no, line in enumerate(fp, 1):\n",
    "        for match in WORD_RE.finditer(line):\n",
    "            word = match.group()\n",
    "            column_no = match.start()+1\n",
    "            location = (line_no, column_no)\n",
    "            index.setdefault(word, []).append(location)  # <1>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fluent [(1, 1)]\n",
      "Jupyter [(2, 1)]\n",
      "notebooks [(1, 15), (2, 9)]\n",
      "Python [(1, 8)]\n"
     ]
    }
   ],
   "source": [
    "# print in alphabetical order\n",
    "for word in sorted(index, key=str.upper):\n",
    "    print(word, index[word])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This shows us that both blocks of code do the same things. However, we use less lines of code with the `.setdefault()` method and it's also more efficient, using a single lookup as opposed up to three with the `.get()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "os.remove('text.txt')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
